Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer

Neural Information Processing Systems

Although autoregressive models have achieved promising results on image generation, their unidirectional generation process prevents the resultant images from fully reflecting global contexts. To address this issue, we propose \emph{Draft-and-Revise}, an effective image generation framework with a \emph{Contextual RQ-Transformer} that considers global contexts during the generation process. As a generalized VQ-VAE, RQ-VAE first represents a high-resolution image as a sequence of discrete code stacks. After code stacks in the sequence are randomly masked, the Contextual RQ-Transformer is trained to infill the masked code stacks based on the unmasked contexts of the image. We then propose a two-phase decoding procedure, Draft-and-Revise, that enables the Contextual RQ-Transformer to generate an image while fully exploiting its global contexts during the generation process.
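The two-phase procedure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration with a toy infiller standing in for the trained Contextual RQ-Transformer: the function name `draft_and_revise`, the masking ratio, and the number of revision rounds are assumptions for illustration, not the paper's exact hyperparameters.

```python
import random

def draft_and_revise(infill, seq_len, num_revise_rounds=2, mask_ratio=0.5, seed=0):
    """Two-phase decoding sketch: draft a full sequence by infilling from an
    all-masked state, then repeatedly re-mask random subsets and infill them
    again so revised positions condition on the surrounding (global) context."""
    rng = random.Random(seed)
    MASK = None  # placeholder for a masked code stack
    # Draft phase: start fully masked and infill every position.
    seq = infill([MASK] * seq_len)
    # Revise phase: re-mask a random subset and infill it, several rounds.
    for _ in range(num_revise_rounds):
        idx = set(rng.sample(range(seq_len), int(seq_len * mask_ratio)))
        masked = [MASK if i in idx else tok for i, tok in enumerate(seq)]
        seq = infill(masked)
    return seq

# Toy "infiller": replaces masked positions with 0. A real model would
# predict discrete code stacks conditioned on the unmasked context.
toy_infill = lambda s: [0 if t is None else t for t in s]
print(draft_and_revise(toy_infill, seq_len=8))  # → [0, 0, 0, 0, 0, 0, 0, 0]
```

In the actual framework, `infill` would be the trained model producing RQ-VAE code stacks; the key design point the sketch shows is that every revision round conditions on an already-complete draft, unlike unidirectional autoregressive decoding.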


Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer (Supplementary Material) — A. Implementation Details, A.1 Details of RQ-VAE

Neural Information Processing Systems

In this section, we show additional examples of images generated by our Contextual RQ-Transformer. We use a Contextual RQ-Transformer with 1.4B parameters trained on ImageNet for class-conditional image generation.



